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Abstract 



Algorithms that exploit sort orders are widely used to implement joins, grouping, du- 
plicate elimination and other set operations. Query optimizers traditionally deal with sort 
orders by using the notion of interesting orders. The number of interesting orders is unfor- 
tunately factorial in the number of participating attributes. Optimizer implementations use 
heuristics to prune the number of interesting orders, but the quality of the heuristics is un- 
clear. Increasingly complex decision support queries and increasing use of covering indices, 
which provide multiple alternative sort orders for relations, motivate us to better address the 
problem of optimization with interesting orders. 

We show that even a simplified version of optimization with sort orders is NP-hard and 
provide principled heuristics for choosing interesting orders. We have implemented the 
proposed techniques in a Volcano-style cost-based optimizer, and our performance study 
shows significant improvements in estimated cost. We also executed our plans on a widely 
used commercial database system, and on PostgreSQL, and found that actual execution times 
for our plans were significantly better than for plans generated by those systems in several 
cases. 
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1 Introduction 



Decision support queries, extract-transform-load (ETL) operations, data cleansing and integra- 
tion often use complex joins, aggregation, set operations and duplicate elimination. Sorting based 
query processing algorithms for these operations are well known. Sorting based algorithms are 
quite attractive when physical sort orders of one or more base relations fulfill the sort order re- 
quirements of operators either completely or partially. Further, secondary indices that cover a 
querjQ are being increasingly used in read-mostly environments. Query covering indices make it 
very efficient to obtain desired sort orders without accessing the data pages. These factors make 
it possible for sort based plans to significantly outperform hash based counterparts. 

The notion of interesting orders I1SMD"^79B has allowed optimizers to consider plans that 
could be locally sub-optimal, but produce orders that are beneficial for other operators, and thus 
produce a better plan overall. However, the number of interesting orders for most operators 
is factorial in the number of attributes involved. This is not acceptable as queries in the afore 
mentioned applications do contain large number of attributes in joins and set operations. 

In this paper we consider the problem of optimization taking sort orders into consideration. 
We make the following technical contributions: 

1. Often order requirements of operators are partially satisfied by inputs. For instance, con- 
sider a merge-join with join predicate (r.ci = s.ci and r.cj = s.cj). A clustering index on 
r.ci (or on r.C2 or s.Ci or 5.C2) is helpful in getting the desired order efficiently; a secondary 
index that covers the query has the same eff"ect. 

We highlight (in Section|3]) the need for exploiting partial sort orders and show how a minor 
modification to the standard replacement selection algorithm can avoid run generation I/O 
completely when input is known to have a partial sort order. Further, we extend a cost- 
based optimizer to take into account partial sort orders. 

2. We consider operators with flexible order requirements and address the problem of choos- 
ing good interesting orders so that complete or partial sort orders already available from 
inputs can be exploited. 

• In Sections |4] we show that a special case of finding optimal sort orders is NP-hard 
and give a 2-approximation algorithm to choose interesting sort orders for a join tree. 

• In Section |5] we address a more general case of the problem. In many cases, the 
knowledge of indices and available physical operators in the system allows us to 
narrow down the search space to a small set of orders. We formalize this idea (in 
Section 15.11) through the notion of favorable orders, and propose a heuristic to ef- 
ficiently enumerate a small set of promising sort orders. Unlike heuristics used in 
optimizer implementations, our approach takes into account issues such as (0 added 
choices of sort orders for base relations due to the use of query covering indices (zz) 
sort orders that partially match an order requirement (zzz) requirement of same sort 

'i.e., contain all attributes of the relation that are used in the query 
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order from multiple inputs (e.g., merge based join, union) and (zv) common attributes 
between multiple joins, grouping and set operations. 

In Section 15.21 we also show how to integrate our extensions into a cost-based opti- 
mizer. 

3. We present experimental results (in Section (6]) evaluating the benefits of the proposed tech- 
niques. We compare the plans generated by our optimizer with those of three widely used 
database systems and show significant benefits due to each of our optimizations. 

2 Related Work 

Both System R IISMD"^79ll and Volcano [IGM93II optimizers consider plans that could be locally 
sub-optimal but provide a sort order of interest to other operators and thus yield a better plan 
overall. However, the papers assume operators have one or few exact sort orders of interest. 
This is not true of operators like merge-join, merge-union, grouping and duplicate elimination 
that have a factorial number of interesting orders. Heuristics such as the PostgreSQL heuristic 
we shall see, are commonly used by optimizers. Details of the heuristics are publicly available 
only for PostgreSQL. Further, System R and Volcano optimizers consider only those sort orders 
as useful that completely meet an order requirement. Plans that partially satisfy a sort order 
requirement are not handled. In this report we address these two issues. 

The seminal work by Simmen et.al. [ISSM96I describes techniques to infer orders from func- 
tional dependencies and predicates applied and thereby avoids redundant sort enforcers in the 
plan. The paper briefly mentions the problem of non-exact sort order requirements and mentions 
an approach of propagating an order specification that allows any permutation on the attributes 
involved. Though such an approach is possible for single input operators like group-by, it can- 
not be used for operators such as merge-join and merge-union for which the order guaranteed 
by both inputs must match. Moreover, the paper does not make it clear how the flexible order 
requirements are combined at other joins and group-by operators. Simmen et.al. IISSM96II men- 
tion that the approach of carrying a flexible order specification also increases the complexity 
of the code significantly. Our techniques do not use flexible order specifications and hence can 
be incorporated into an existing optimizer with minimal changes. Further, our techniques work 
uniformly across all types of operators that have a flexible order requirement. Work on inferring 
orders and groupings I1SSM96II IIWC03II llNM04all llNM04bll is independent and complementary 
to our work. 

3 Exploiting Partial Sort Orders 

Often, sort order requirements of operators are partially satisfied by indices or other operators in 
the input subexpressions. A prior knowledge of partial sort orders available from inputs allows 
us to efficiently produce the required (complete) sort order more efficiently. When operators have 
flexible order requirements, it is thus important to choose a sort order that makes maximum use 
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of partial sort orders already available. We motivate the problem with an example. Consider the 
query shown in Example [TJ Such queries frequently arise in consolidating data from multiple 
sources. The join predicate between the two catalog tables involves four attributes and two of 
these attributes are also involved in another join with the rating table. Further, the order-by 
clause asks for sorting on a large number of columns including the columns involved in the join 
predicate. 

Example 1 A query with complex join condition 

SELECT cl.make, cl.year, cl.city, cl. color, cl.sellreason, c2. breakdowns, r.rating 
FROM catalogl cl, catalog! c2, rating r 

WHERE cl .city-c2. city AND cl .make-c2.make AND cl .year-c2.year AND cl .color— c2. color AND 

cl.make—r.make and cl.year— r.y ear 
ORDER BY cl.make, cl.year cl. color cl.city, cl.sellreason, c2. breakdowns, r.rating; 




catalogl catalog2 rating 

C.Idx Scan (50K) Table scan (40K) Table scan (40) 

sort-1 : (y) — > (y, m, c, co) sort-2: (m) — > (y, m, c, co) 
sort-3: ( ) — > (y, m) sort-4: (y, m, c, co) — >(m, y, co, c,...) 




catalogl catalog2 rating 

Table scan (50K) C.Idx Scan(40K) Cov. Idx Scan(lO) 

sort-1: (y) — > (m, y, co, c) sort-2: (m) — > (m, y, co, c) 

soit-3: (m) — > (m, y) sort-4: (m, y, co, c) — > (m, y, co, c, ...) 



Figure 1 : A naive plan Rgurc 2: Optimal merge-join plan 

The two catalog tables contain 2 million records each and have average tuple sizes of 100 
and 80. We assume a disk block size of 4K bytes and 10000 blocks (40 MB) of main memory 
for sorting. The table catalogl is clustered on year and the table catalogl is clustered on make. 
The rating table has a secondary index on the make column with the year and rating columns 
included in the leaf pages (a covering index). Figure [T] and [2] show two different plans for the 
example query. Numbers in the parentheses indicate estimated cost of the operators in number 
of I/Os (CPU cost is appropriately translated into I/O cost units). Edges are marked with the 
number of tuples expected to flow on that edge and their average size. For brevity, the input 
and output orders for the sort enforcers are shown using only the starting letters of the column 
names. Though both plans use the same join order and employ sort-merge joins, the second plan 
is expected to perform significantly better than the first. 



3.1 Changes to External Sort 

External sorting algorithms have been studied extensively but in isolation. The standard replace- 
ment selection ||Knu73ll for run formation well adapts with the extent to which input is presorted. 
In the extreme case, when the input is fully sorted, it generates a single run on the disk and avoids 
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merging altogether. Larson ULarOBII revisits run formation in the context of query processing and 
extends the standard replacement selection to handle variable length keys and to improve local- 
ity of reference (reduced cache misses). Estivill-Castro and Wood [IECW92II provide a survey 
of adaptive sorting algorithms. The technique we propose in this section to exploit partial sort 
orders is a specific optimization in the context of multi-key external sorting. We observe that, 
by exploiting prior knowledge of partial sort order of input, it is possible to eliminate disk I/O 
altogether and have a completely pipelined execution of the sort operator. 
We use the following notations: We use 0,01,02 etc. to refer to sort orders. Each sort or- 
der o is a sequence of attributes/columns (<3i,<32, • • -cin)- We ignore the sort direction (ascend- 
ing/descending) as our techniques are applicable independent of the sort direction. 

• 6 : Empty (no) sort order 

• attrs{o) : The set of attributes in sort order o 

• \o\ : Number of attributes in the sort order o 

• o\ < 02: Order 02 subsumes order o\ {o\ is a prefix of 02) 

• o\ < 02'- Order oi is a strict prefix of 02 

Consider a case where the sort order to produce is (co/i, co/2) and the input already has the 
order {col\). Standard replacement-selection writes a single large run to the disk and reads it 
back again; this breaks the pipeline and incurs substantial I/O for large inputs. It is not difficult 
to see how the standard replacement- selection can be modified to exploit the partial sort orders. 
Let o = (ai, a2, . . . a„) be the desired sort order and o' = (ai,a2, ■ ■ .at), k < n be the partial sort 
order known to hold on the input. At any point during sorting we need to retain only those tuples 
that have the same value for attributes ai,a2, ■ ■ Mk- When a tuple with a new value for these 
set of attributes is read, all the tuples in the heap (or on disk if there are large number of tuples 
matching a given value of a\,a2, . . Mk) can be sent to the next operator in sorted order. Thus 
in most cases, partial sort orders allow a completely pipelined execution of the sort. Exploiting 
partial sort orders in this way has several benefits: 

1. Let o = (ai, a2, . . . a„) be the desired sort order and o' = {a\,a2, . . . au), k <nhQ the partial 
sort order known to already hold on the input. We call the set of tuples that have the same 
value for attributes {ai,a2, . . .a*;) as a partial sort segment. If each partial sort segment 
fits in memory (which is quite often the case in practice), the entire sort operation can be 
completed without any disk I/O. 

2. Exploiting partial sort orders allows us to output tuples early (as soon as a new segment 
starts). In a pipelined execution this can have large benefits. Moreover, producing tuples 
early has immense benefits for Top-K queries and situations where the user retrieves only 
some result tuples. 

3. Since sorting of each partial sort segment is done independently, the number of compar- 
isons are significantly reduced. Note that we empty the heap every time a new segment 
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starts and hence insertions into heap will be faster. In general, independently sorting k seg- 
ments each of size n/k elements, has the complexity 0{n log{n/k)) as against 0{n login)) 
for sorting all n elements. Further, while sorting each partial sort segment comparisons 
need to be done on fewer attributes, {ak+\ , ■ ■ ■ a„) in the above case. 

Our experiments (Section (6]) confirm that the benefits of exploiting partial sort orders can be 
substantial and yet none of the systems we evaluated, though widely used, exploited the partial 
sort orders. 

3.2 Optimizer Extensions for Partial Sort Orders 

In this section we assume order requirements of operators are concrete and only focus on incor- 
porating partial sort orders. We deal with flexible order requirements in subsequent sections. 
We use the following notations: 

• oi A 02 : Longest common prefix between 01 and 02 

• o\+ 02'- Order obtained by concatenating o\ and 02 

• o\- 02'- Order o' such that 02 + 0' = o\ (defined only when 02 < o\) 

• coe{e, 01,02) '■ The cost of enforcing order 02 on the result of expression e which already has order 

oi 

• N{e) : Expected size, in number of tuples, of the result of expression e 

• B{e) : Expected size, in number of blocks, of the result of expression e 

• D{e, s) : Number of distinct values for attribute(s) s of expression e { = NiHsie)) ) 

• cpu_cost{e, o) : CPU cost of sorting result of e to get order o 

• M : Number of memory blocks available for sorting 

The Volcano optimizer framework IIGM93II assumes an algorithm (physical operator) either 
guarantees a required sort order fully or it does not. Further, a physical property enforcer (such 
as sort) only knows the property to be enforced and has no information about the properties that 
hold on its input. The optimizer's cost estimate for the enforcer thus depends only on the required 
output property (sort order). In order to remedy these deficiencies we extended the optimizer in 
the following way: Consider an optimization goal (e, o), where e is the expression and o the 
required output sort order. If the physical operator being considered for the logical operator at 
the root of e guarantees a sort order o' < o, then the optimizer adds a partial sort enforcer en/ to 
enforce o from o' . We use the following cost model to account for the benefits of partial sorting. 



If e is known to have the order oi, we estimate the cost of obtaining an order 02 as follows: 
coe{e, 01,02) = D{e, attrs{Os)) * coe(e', e,o,), where o^ = 02 A o\, o,- = 02 - o^ and e' = o-p(e), 
where p equates attributes in o^ to an arbitrary constant. Intuitively, we consider the cost of 
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sorting a single partial sort segment independently and multiply it by the number of segments. 
Note that we assume uniform distribution of values for attrs{Os). Therefore, we estimate N(e') = 
N{e)/D{e, attrs{Os)) and B{e') = B{e)/D{e, attrs{Os))- When the actual distribution of values is 
available, a more accurate cost model that does not rely on the uniform distribution assumption 
can be used. 

4 Choosing Sort Orders for a Join Tree 

Consider a join expression e = e\ x e2, where e\,e2 are input subexpressions and the join 
predicate is of the form: {ey.ay = e2.a\ and ei.a2 = e2-a2--- and ei.a„ = e2-an)- Note that, 
w.l.g., we use the same name for attributes being compared from either side and we call the set 
{ai,a2, . . . , a„} as the join attribute set. In this case, the merge join algorithm has potentially n\ 
interesting sort orders on inputs ei and e2@- The specific sort order chosen for the merge-join 
can have significant influence on the plan cost due to the following reasons: (?) Clustering and 
covering indices, indexed materialized views and other operators in the subexpressions ei,e2 
can make one sort order much cheaper to produce than another, (ii) The merge-join produces 
the same order on its output as the one selected for its inputs. Hence, a sort order that helps 
another operator above the merge-join can help eliminate a sort or just have a partial sort. In this 
section we show that a special case of the the problem of choosing optimal sort orders for a tree 
of merge-joins is NP-Hard and provide a 2-approximate algorithm for the problem. In the next 
section, we describe our heuristics for a more general setting of the problem in which we make 
use of the proposed 2-approximate algorithm. 

4.1 Finding Optimal is NP-Hard 

Consider a join expression e = Ri \x R2 x R3 ■ ■ ■ Rn and a specific join order tree for the 
expression. Consider a special case where all base relations and intermediate results are of the 
same size and no indices built on the base relations. Now, the problem of choosing optimal sort 
orders for each join requires us to choose permutations of join attributes such that we maximize 
the length of longest common prefixes of permutations chosen for adjacent nodes. Figure |3l 
shows an example and an optimal solution under the model where the benefit for an edge is the 
length of the longest common prefix between the permutations chosen for adjacent nodes and 
we maximize the total benefit. The join attribute set for each join node is shown in curly braces 
besides the node. Permutations chosen in the optimal solution are indicated with angle brackets 
and the number on each edge shows the benefit for that edge. Below we state the problem 
formally. 

Problem 1 Let T be a binary tree of order n, with vertex set V{T) and edge set E{T). Each 
node Vj (i = 1 , . . . n) is associated with an attribute set Sj. Find a sequence of permutations 
P\,P2 - . ■ Pn, where pi is a permutation of set Sj, such that the benefit function T is maximum. 

^We assume merge-join requires sorting on all attributes involved in the join predicate. We do not consider orders 
on subsets of join attributes since the additional cost incurred at merge-join matches the benefit of sorting a smaller 
subset of attributes. 
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/ / \ / \ / \ 

^ \ / \ / \ / \ 

R1 R2 R3 R4 R5 R6 R7 R8 

Total benefit of the optiami solution=8 



Figure 3: A special case of choosing globally optimal sort order 



^^v.VjeEiT) 

Theorem 4.1 Problem\T\is NP-hard. 

Proof: We give a reduction from the known NP-Hard problem SUM-CUT [IDPS02L 

Problem 2 (SUM-CUT) Given a graph G = (V, E) with m = \V\ vertices, number the vertices of 
G as 1, . . .m such that 2i<(<m is minimized, where c, is the number of vertices numbered < i 
that are connected to at least one vertex numbered > i. 

The SUM-CUT problem is equivalent to Problem [2] given below; to see the equivalence 
consider the complement of the graph. 

Problems Given a graph G = (V,E) with m = \V\ vertices, number the vertices of G as 
1, 2, . . . , m such that X !</</» 1i is maximized, where qi is the number of vertices that are adja- 
cent to all the vertices numbered 1 to i. 

We now reduce Problem [3] to Problem [IJ The construction is as follows: 

Let the m vertices of the graph be labeled Ui, . . . , U,ji. We construct the binary tree by choosing 
a vertex set V(T) of size 2m, {vi, V2 . . . , V2m}- vi, . . . v,n are internal vertices and v„,+i, . . . , V2m are 
leaf vertices. Let the edge set E{T) be {v,v,+i : 1 < i < m} [j{viVm+i : 1 < z < m). Attribute set 
Si of internal vertex v, is V{G) U X, where X is an arbitrarily large set disjoint from V{G), for 
1 < / < m. Attribute set Si of leaf vertex v,- is {w : w e V{G) and wm,_,„ e E{G)}, for m < i < 2m. 
□ 
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4.2 A 2-Approximate Algorithm 

An efficient dynamic programming based aigoritlim to find tlie optimai soiution (under tlie benefit 
modei presented in tlie previous section) for Problem [H exists when the tree is a path. Note that 
left-deep and right-deep join plans result in paths. We first present this algorithm and make use 
of it in finding a 2-approximation for binary trees. 

Consider a path Vi, V2, . . .v„, where each vertex v, is associated with an attribute set 5,. The 
optimal solution for any segment (/, j) of the path, OPT(/, j) = max{ OPT(/, ^)+OPT(^ + 1, j) + c{i, j) 
} over all i < k < j, where c{i, j) is the number of common attributes for the segment (i, j). 

The dynamic programming algorithm starts by finding the optimal solution for all segments 
of size 1, then of size 2 and so on. At any segment (z, j), every possible way of splitting the 
segment into smaller segments is considered and the solution for the smaller segments obtained 
from the memo structure. The optimal solution for segment (/, j) is then memoized. Procedure 
PathOrder in Figure |4] gives the details. 

For binary trees we propose an approximation with benefit at least half that of an optimal 
solution. We split the tree into two sets of paths, and Pg. P^ has the paths formed by edges 
at odd levels and P^ has those formed by edges at even levels. Figure [5] shows an example. We 
use the procedure PathOrder to find optimal solutions for each of the two sets of paths. Let 
the the optimal solutions for the two sets of paths be So and Se and the corresponding benefits 
be ben(So) and ben(S e)- Let the set of edges included in Po and Pe be denoted by Eo and Eg 
respectively. Consider an optimal solution S j for the whole tree. In the optimal solution, let the 
sum of benefits of all edges in be odd-ben(S j) and that of edges in E^ be even-ben(S j)- Note 
that ben(So) > odd-ben(S t) and ben(Sg) > even-ben(S t)- Since the total benefit of the optimal 
solution ben(ST) = odd-ben(S t) + even-ben(S t), we have ben(S „) + ben(S e) > ben(ST)- Hence 
at least one of ben(So) or ben(S g) is > 1/2 beniSj)- There may be vertices not included in the 
chosen solution, e.g., the even level split in Figure [5] does not include the root and leaf nodes. 
For these left over vertices arbitrary permutations can be chosen. 

5 Optimization Exploiting Favorable Orders 

The benefit model we presented in the previous section and the approximation algorithm do not 
take into account indices and size of relations or intermediate results. Moreover, we assumed 
that the join order tree is fixed. In this section we present a two phase approach for the more 
general problem. In phase- 1, which occurs during plan generation, we exploit the information 
of available indices and properties of physical operators to efficiently compute a small set of 
promising sort orders to try. We formalize this idea through the notion of favorable orders. 
Phase-2, is a plan refinement step and occurs after the optimizer makes its choice of the best 
plan. In phase-2, the sort orders chosen by the optimizer are refined further to reap extra benefit 
from the attributes common to multiple joins. Phase-2 uses the 2-approximate algorithm of 
Section 1121 
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Input: 

s[n] : array of attribute sets 
Output: 

p[n] : array of permutations or orders 
Data Structures: 

benefit[n][n], split[n][n] : arrays of integers 
commons[n][n] : array of attribute sets 

apermute(s) : Function tliat returns an arbitrary permutation of attribute set s 
Procedure PatliOrder 
BEGIN 

for i=1 to n 

benefit[i][i] = 0; commons[i][i] = s[i]; split[i][i] = -1 ; 
for j=1 to n-1 

for i = 1 to n-j 

Let k be tlie index sucli tliat i < k < (i+j) and benefit[i][k]+benefit[k+1][i+j] is maximum 
commons[i][i+j] = commons[i][k] n commons[k+1][i+j]; 
benefit[i][i+j] = benefit[i][k] + benefit[k+1][i+j] + |commons[i][i+j]|; 
split[i][i+j] = k; 
Call MakePermutation(1, n); 
END PROC 

Procedure MakePermutation(i, j) 
BEGIN 

if (i = j) 

p[i] = Append apermute(commons[i][i]) to p[i]; 
return; 
for k=i to j 

p[k] = Append apermute(commons[i][j]) to p[k]; 
Forall(i\j')^(i,j) 

commons[i']0'] = commons[i'][j'] - commons[i]0]; 
m = split[i]0]; 
l\/lakePermutation(i, m); 
l\/lakePermutation(m+1 , j); 
END PROC 



Figure 4: Optimal Orders for Path 

5.1 Favorable Orders 

Given an expression e, we expect some sort orders (on the result of e) to be producible at much 
lesser cost than other sort orders. Available indices, indexed materialized views, specific rewrit- 
ing of the expression and choice of physical operators determine what sort orders are easy to 
produce. To account for such orders, we introduce the notion of favorable orders. We use the 
following notations: 

• cbp{e, o) : Cost of the best plan for expression e with o being the required output order 

• Or : The clustering order of relation R 

• idx{R) : Set of all indices over R 

• o{I) : Order (key) of the index / 
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\ / \ 



Paths of odd levels 



Paths of even levels 



Figure 5: A 2-approximation for binary trees 



• (s) : An arbitrary order (permutation) on attribute set s 

• Pis) : Set of all permutations of set s 

• OAs: Longest prefix of o such that each attribute in the prefix belongs to the attribute set s 

• schema{e) : The set of attributes in the output of e 

We first define the benefit of a sort order ow.r.t. an expression e as follows: 
benefit(o, e) = cbp(e, e) + coe(e, e, o) - cbp(e, o) 

Intuitively, a positive benefit implies the order can be obtained with lesser cost than a full sort 
of unordered result. For instance, the clustering order of a relation r will have a positive benefit 
for the expression o-p(r). Similarly, query covering secondary indices and indexed materialized 
views can yield orders with positive benefit. We call the set of all orders, on schema(e), having a 
positive benefit w.r.t. e as the favorable order set of e and denoted it as ford(e). 

ford(e)= { o: benefit(o, e)> 0} 
5.1.1 Minimal Favorable Orders 

The number of favorable orders for an expression can be very large. For instance, every order 
having the clustering order as its prefix is a favorable order. A minimal favorable order set of 
e, denoted hy ford-min(e), is the minimum size subset of ford(e) such that, for each order o e 
ford(e), at least one of the following is true: 

1 . o belongs to ford-min( e ) 

2. 3 o' eford-min(e) such that o' < o and cbp(e,o') + coe(e,o',o) = cbp{e,o) Intuitively, if 
the cost of obtaining order o equals the cost of obtaining a partial sort order o' followed by 
an explicit sort to get o, we include only o' in the ford-min 

3. 3 o" & ford-min(e) such that o < o" and cbp{e,o") = cbp(e,o) Intuitively, if an order o" 
subsumes order o and has the same cost, we include only o" in ford-min 

Conditions 2 and 3 above, ensure that when a relation has an index on o, orders that are 
prefixes of o and orders that have o as their prefix are not included unnecessarily. 
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We define favorable orders of an expression w.r.t. a set of attributes s as: ford(e, s)= {o A s: 
o eford(e)} Intuitively, /orJfe, s) is, the set of orders on 5 or a subset of s that can be obtained 
efficiently. Similarly, the ford-min of an expression w.r.t. a set of attributes s is defined as: ford- 
min(e, s)= {o A s : o eford-min(e)} 

5.1.2 Heuristics for Favorable Orders 

Note that the definition of favorable orders uses the cost of the best plan for the expression. 
However, we need to compute the favorable orders of an expression before the expression is 
optimized and without requiring to expand the logical or the physical plan space. Further, the 
size of the exact ford-min of an expression can be prohibitively large in the worst case. In this 
section, we describe a method of computing approximate ford-min, denoted as afin, for SPJG 
expressions. We compute the afm of an expression bottom-up. For any expression e, afm(e) is 
computable after the afm is computed for all of e's inputs. 

1. e = R, where i? is a base relation or materialized view. We include the clustering order of 
R and all secondary index orders such that the index covers the query. 

afm(R) = {o : o = Or or o = o(I), I 6 idx{R) and / covers the query} 

2. e = o-p{ei), where ei is an arbitrary expression. 
afm{e) = {o : o 6 afm(ei) } 

3. e = Ili(ei), where ei is any expression. We include longest prefixes of input favorable 
orders such that the prefix has only the projected attributes. 

afm{e) = {o : 3o' e afm(e\) and o = o' A L} 

4. e = ei M ^2 with join attribute set S = {a\,a2, ■ ■ .an). Noting that nested loops joins 
propagate the sort order of one of the inputs (outer) and merge join propagates the sort 
order chosen for join attributes, we compute the afm as follows. First, we include all sort 
orders in the input afms. Next, we consider the longest prefix of each input favorable 
order having attributes only from the join attribute set and extend it to include an arbitrary 
permutation of the remaining join attributes. 

afm(ei x 62) = T\J {o : o' e T\J{e} and o = o' A S + {S-attrs(o' A 5))}, where 
T =afm{ei) \J afm{e2) 

Note that, for the join attributes not involved in an input favorable order prefix {i.e., 
S -attrs{o' A S )), we take an arbitrary permutation. An exact ford-min would require us to 
include all permutations of such attributes. In the post-optimization phase, we refine the 
choice made here using the benefit model and algorithm of Section 

5. e =L Qpiei) 

afm{e) = {o : o' e afin(ei)[J{e} and o = o' A L {L-attrs{o' A L))} 
Intuitively, for each input favorable order we identify the longest prefix with attributes 
from the projected group-by columns and extend the prefix with an arbitrary permutation 
of the remaining attributes. 
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Computing afms requires a single pass of the query tree. At each node of the query tree the 
only significant operation performed is computation of the longest common prefix Icp of an order 
w.r.t. a set, i.e., o A s. The number of Icp operations performed at a node is at most the number 
of input favorable orders for that node. Let m be the average number of favorable orders for 
base relations. If the query involves a join of n relations the worst-case number of Icp operations 
performed at any node is 0(2"m). Typically m is very small (< 2) and hence even the worst- 
case number of Icp operations is within acceptable limits. However, in most practical cases, the 
number of Icp operations performed is far below the worst-case. 

5.2 Overall Optimizer Extensions 

We make use of the approximate favorable orders during plan generation (phase- 1) to choose 
a small set of promising interesting orders for sort-based operators. We describe our approach 
taking merge join as an example but the approach is applicable to other sort based operators. In 
phase-2, which is a post-optimization phase, we further refine the chosen sort orders. 

5.2.1 Plan Generation (Phase-1) 

Consider an optimization goal of expression e = ei x e,- and required output sort order o. When 
we consider merge-join as a candidate algorithm, we need to generate sub-goals for e; and e,. 
with the required output sort order being some permutation of the join attributes. 

Let S be the set of attributes involved in the join predicate. We consider only conjunctive and 
equality predicates. We compute the set I(e, o) of interesting orders as follows: 

1 . Collect the favorable orders of inputs plus the required output order 

7~(e,o) =afm{ei,S) U afm{er,S) [J o A S , where afm{e,S) = {o' AS : o' e afm{e)} 

2. Remove redundant orders 

If Oi, 02 s T{e, o) and oi < 02, remove oi from T{e, o) 

3. Compute the set I{e, o) by extending each order in T{e, o) to the length of |5 1; the order of 
extra attributes can be arbitrarily chosen 

I{e, o) = {o : o' e T{e, o) and = 0' + {S - attrs{o'))} 

We then generate optimization sub-goals for ej and e,. with each order o' e J(e, o) as the 
required output order and retain the cheapest combination. 

A Note on Optimality: If the set J(e, o) is computed using the exact ford-mins instead of afms, 
we claim that it must contain an optimal sort order (a sort order that produces the optimal merge 
join plan in terms of overall plan cost). Appendix lAl gives the detailed proof of this claim. 
An Example: Consider Example[T]of Section[3l For brevity, we refer to the two catalog tables as 
ctl and ct2, the rating table as rt and the columns with their starting letters. The afms computed 
as described in Section [5. 1.21 are as follows: afm{ct\) = {(y)}, afin(ct2) = {(m)}, afin(rt) = {(m)}, 
afm{ct\ xi ct2) = {{y,co,c,m),{m,co,c,y)}, afm{{ct\ x ctl) x rt) = {(y,m),{m,y)} 

For (ctl IX ctl) M rt we consider two interesting sort orders {(y, m), (m,y)} and for ctl ix ctl 
we consider the four orders {(y, co, c, m), (m, co, c, y), (y, m, co, c), (m, y, co, c)}. As a result the 
optimizer arrives at plan shown in Figure [2l 
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5.2.2 Plan Refinement (Phase-2) 

During the plan refinement phase, for each merge-join node in the plan tree, we identify the 
set of free attributes, the attributes which were not part of any of the input favorable orders. 
Note that for these attributes we had chosen an arbitrary permutation while computing the afm 
(Section [5. 1.21) . We then make use of the 2-approximate algorithm for trees (Section |431) and 
rework the permutations chosen for the free attributes. 

Formally, let pj be the permutation chosen for the join node v, . Let qj be the order such that 
qi € afin(vi. left-input) U afm{vi. right-input) and \pi A is the maximum. Intuitively, is the 
input favorable order sharing the longest common prefix with p,. Let f =attrs(pi - {pi A qt)); f 
is the set of free attributes for v,. 

We now construct a binary tree where each node 77, corresponding to join-node v, is associated 
with the attribute set f. The orders for the nodes are chosen using the 2-approximate algorithm; 
the chosen order for free attributes is then appended to the order chosen during plan generation 
(i.e., Pi A qi) to get a complete order. 

The reworking of the orders will be useful only if the adjacent nodes share the same prefix, 
i.e.. Pi A qi was the same for adjacent nodes. This condition however certainly holds when the 
inputs for joins have no favorable orders. 

Figure [6] illustrates the post-optimization phase. Assume all relations involved (Ri . . .R4) 
are clustered on attribute a and no other favorable orders exist, i.e., afm{Ri) = {(a)}, for i = 1 
to 4. The orders chosen by the plan generation phase are shown besides the join nodes with 
free attributes being in italics. The reworked orders after the post-optimization phase are shown 
underlined. 



6 Experimental Results 

We performed experiments to evaluate the benefits our techniques. For comparison, we use 
PostgreSQL (version 8.1.3) and two widely used commercial database systems (we call them 
SYSl and SYS2). All tests were run on an Intel P4 (HT) PC with 512 MB of RAM. We used 
TPC-H 1GB dataset and additional tables as specified in the individual test cases. For each table, 
a clustering index was built on the primary key. Additional secondary indices built are specified 




Figure 6: Post-Optimization Phase 
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along with the test cases. All relevant statistics were built and the optimization level for one of 
the systems, which supports multiple levels of optimization, was set to the highest. 



6.1 Modified Replacement Selection 

The first set of experiments evaluate the benefits of modified replacement selection (MRS) as 
compared to the standard replacement selection (SRS) when the input is known to be partially 
sorted. 

External sort in PostgreSQL employs the standard replacement selection (SRS) algorithm 
suitably adapted for variable length records. We modified this implementation to exploit partial 
sort orders available on the input. 

Experiment Al: The first experiment consists of a simple ORDER BY of the TPC-H lineitem 
table on two columns (Isuppkey, l-partkey). 

Query 1 ORDER-BY on lineitem 

SELECT l^uppkey, l^partkey FROM lineitem ORDER BY ljuppkey, l^artkey; 

A secondary index on l_suppkey was available that covered the query (included the l_partkey 
column]^. On all three systems, the order by on (l_suppkey, Lpartkey) took almost the same 
time as an order by on (Lpartkey, Isuppkey) showing that the sort operator of these systems 
did not exploit partial sort orders effectively. We then compared the running times with our 
implementation that exploited partial sort order (l^uppkey) and the results are shown in Figure |7l 

On SYSl and SYS2 we simulated the partial sorting using a correlated rank query (as we did 
not have access to their source code). The subquery sorted the index entries matching a given 
Lsuppkey on l^iartkey and the subquery was invoked with all suppkey values so as to obtain the 
desired sort order of (lsuppkey, Lpartkey). 
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''On systems not supporting indexes with included columns, we used a table with only the desired two columns, 
clustered on ljuppkey 
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select * from R order by c1 , c2; (R pre-sorted on c1 ) 
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Figure 9: Effect of Partial Sort Segment Size 



By avoiding run generation 1/0 and making reduced comparisons, MRS performs 3-4 times 
better than SRS. 

Experiment A2: The second experiment shows how MRS is superior in terms of its ability to 
produce records early and uniformly. Table Rt, having 3 columns (cl , c2, c3) was populated with 
10 million records and was clustered on (cl). The query asked an order by on (cl, c2). Figure[8] 
shows the plot of number of tuples produced vs. time with cardinality of cl = 10, 000. 

MRS starts producing the tuples without any delay after the operator initialization where as 
SRS produces its first output tuple only after seeing all input tuples. By producing tuples early, 
MRS speeds up the pipeline significantly and also helps Top-K queries. 

Experiment A3: The third experiment shows the effect of partial sort segment size on sorting. 
8 tables . . .Rj, with identical schema of 3 columns (cl, c2, c3) were each populated with 10 
million records and average record size of 200 bytes. Each table was clustered on (cl). Table 
Ri had 10' tuples for each value of cl, resulting in a partial sort segment size of 200 x 10' bytes. 
Thus Rq had cl as unique and sort segment size of 200 bytes and Rj had the same value of cl for 
all 10 million records leading to a sort segment size of 2GB. The query asked for an order by on 
(cl, c2). The running times with default and modified replacement selection on PostgreSQL are 
shown in Figured 

When the partial sort segment size is small enough to fit in memory (up to 10MB or 50K 
records), SRS produces a single sorted run on disk and does not involve merging of runs. The 
modified replacement selection (MRS) gets the benefit of avoiding I/O and reduced number of 
comparisons. When the partial sort segment size becomes too large to fit in memory, we see a 
sudden rise in the time taken by SRS. This is because replacement selection will have to deal with 
merging several runs. MRS however deals with merging smaller number of runs initially as each 
partial sort segment is sorted separately. As the partial sort segment size increases, the running 
time of MRS rises and becomes same as that of SRS at the extreme point where all records have 
the same value for cl. 
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Experiment A4: To see the influence of MRS on a query having other operators, we considered 
a query that asked for counting the number of lineitems for each supplier, part pair. Two indices, 
lineitem(l_suppkey) and partsupp(ps_suppkey), each of which included other required columns 
supplied the required sort order partially. 

Query 2 Number of lineitems for each (supplier, part) pair 

SELECT psjuppkey, ps^artkey, psjavailqty, count(l^artkey) 
FROM partsupp, lineitem 

WHERE psjuppkey—ljuppkey AND ps^artkey—l^artkey 
GROUP BY psjuppkey, ps^artkey, psMvailqty 
ORDER BY psjuppkey, ps^artkey; 

The query took 63 seconds to execute with SRS and 25 seconds with MRS, both on Postgres. 
The query plan used in both cases was the same - a merge join of the two relations on (suppkey, 
partkey) followed by an aggregate. 

6.2 Choice of Interesting Orders 

We extended our Volcano-style cost based optimizer, which we call PYRO, to consider partial 
sort orders and choose good interesting sort orders for merge joins and aggregation. We com- 
pare the plans produced by the extended implementation, which we call PYRO-0, with those of 
Postgres, SYSl and SYS2. 

Experiment Bl: For this experiment we used Query [3] given below, which lists parts for which 
the outstanding order quantity is more than the stock available at the supplier. 

Query 3 Parts Running Out of Stock 

SELECT psjuppkey, ps^partkey, psjavailqty, sum(ljquantity} 
FROM partsupp, lineitem 

WHERE psjuppkey—ljuppkey AND ps partkey— I partkey AND IJinestatus— 'O ' 

GROUP BY psMvailqty, ps^partkey, psjuppkey 

HAVING sum(ljquantity) > psMvailqty ORDER BY ps-partkey; 

Table partsupp had clustering index on its primary key (ps partkey, ps suppkey). Two sec- 
ondary indices, one on ps_suppkey and the other on Lsuppkey were also built on the partsupp 
and lineitem tables respectively. The two secondary indices covered all attributes needed for the 
query. The experiment shows the need for cost-based choice of interesting orders. The choice 
of interesting orders for the join and aggregate are not obvious in this case for the following 
reasons: 

1 . The explicit order by clause favors the choice of of an order where partkey appears first. 

2. The clustering index on partsupp favors the choice of (partkey, suppkey) as the interesting 
order. 
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3. The secondary indices favor the choice of (suppkey, partkey) that can be obtained by using 
a low cost partial sort. Note that this option can be much cheaper due to the size of the 
lineitem relation. 



Therefore, the optimizer must make a cost-based decision on the sort order to use. Figures 
[lO] and [m show the plans chosen by Postgres, PYRO-0, SYSl and SYS2. 

All plans except the hash-join plan of SYS 1 and the plan produced by PYRO-0 use an ex- 
pensive full sort of 6 million lineitem index entries on (partkey, suppkey). Further, Postgres uses 
a hash aggregate where a sort-based aggregate would have been much cheaper as the required 
sort order was available from the output of merge-join (note that the functional dependency 
{ps_partkey, ps_suppkey} — > {ps_availqty} holds). On SYSl, it was possible to force the use of a 
merge-join instead of hash-join and the plan chosen is shown in Figure [TTTb). 
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(a) Plan Generated by PYRO-O 



Figure 10: Plans for Query 3 (Postgres and PYRO-0) 



We compared the actual running time of PYRO-O's plan with those of Postgres and SYSl 
by forcing our plan on the respective systems. Figures [l2l and [13] show the details. It was not 
possible for us to force our plan on SYS2 and make a fair comparison and hence we omit the 
same. The only surprising result was the default plan chosen by SYSl performed slightly poorer 
than the forced merge-join plan on SYSl. In all cases, the forced PYRO-0 plan performed 
significantly better than the other plans. The main reason for the improvement was the use of a 
partial sort of lineitem index entries as against a full sort. The final sort on partkey was not very 
expensive as only a few tuples needed to be sorted. 

For Query [3] the plan generation phase (phase- 1) was sufficient to select the sort orders and 
phase-2 does not make any changes. We shall now see a case for which phase- 1 cannot make a 
good choice and the sort orders get refined by phase-2. 

Experiment B2: This experiment uses Query IH shown below, which has two full outer joins 
with two common attributes between the joins. We designed this experiment to see whether the 
systems we compare with exploit attributes common to multiple sort-based operators. 

Query 4 Attributes common to multiple joins 
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Figure 11: Plans for Query 3 (SYSl and SYS2) 
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Figure 13: SYSl 



SELECT * FROMRl FULL OUTER JOIN R2 

ON (RLc5=R2.c5 AND RLc4=R2.c4 AND RLc3^R2.c3) 
FULL OUTER JOIN R3 

ON (RS.cl^Rl.cl AND R3.c4^Rl.c4 AND R3.c5^Rl.c5); 



The tables Rl, R2 and R3 were identical and each populated with 100,000 records. No 
indexes were built. As shown in Figure [T4T a), both SYSl and Postgres chose sort orders that do 
not share any common prefix. The plan chosen by PYRO-0 is shown in Figure [T4lb). In the plan 
chosen by PYRO-0, the two joins share a common prefix of (c4, c5) and thus the sorting effort 
is expected to be significantly less. SYS2, not having an implementation of full outer join, chose 
a union of two left outer joins. The two left outer joins used to get a full outer join used different 
sort orders making the union expensive, illustrating a need for coordinated choice of sort orders. 
Experiment B3: In this experiment we compare our approach of choosing interesting orders, 
PYRO-0, with the exhaustive approach, and a heuristic used by PostgreSQL. Postgres uses the 
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(a) On SYSl and Postgres 



(b) On PYRO-O 



Figure 14: Plans for Query 4 

following heuristic: for each of the n attributes involved in the join condition, a sort order be- 
ginning with that attribute is chosen; in each order, the remaining n - I attributes are ordered 
arbitrarily. We implemented Postgres' heuristic in PYRO along with the extensions to exploit 
partial sort orders and call it PYRO-P The exhaustive approach, called PYRO-E, enumerates 
all n\ permutations and considers partial sort orders. In addition, we also compare with PYRO, 
which chooses an arbitrary sort order, and a variation of PYRO-O, called PYRO-O that con- 
siders only exact favorable orders (no partial sort). Figure \T5\ shows the estimated plan costs. 
Note the logscale for y-axis. The plan costs are normalized taking the plan cost with exhaustive 
approach to be 100. In the figure, Q3 and Q4 are Query [3] and Query |4] of Experiments Bl and 
B2. Q5 and Q6 are Queries [5] and [6] shown below. For Q3 and Q4, as very few attributes 
were involved in the join condition, Postgres' heuristic along with extensions to exploit partial 
sort orders, produced plans which were close to optimal. However, for more complex queries the 
heuristic does not perform as well since it makes an arbitrary choice for secondary orders. 

Query 5 Total value executed for a given order 

SELECT Tl.Userld, Tl.BasketId, Tl.ParentOrderld, Tl.Waveld, Tl.ChildOrderld, 

(Tl. Quantity * Tl. Price) as OrderValue, SUM(T2. Quantity * T2. Price) as ExecutedValue 
FROM TRAN Tl, TRAN T2 

WHERE Tl.UserId^T2.UserId AND Tl.ParentOrderId^T2.ParentOrderIdAND Tl.BasketId^T2.BasketId 
AND Tl.WaveId^T2.WaveId AND Tl.ChildOrderId^T2.ChildOrderIdAND Tl.TranType^'New' 
AND T2.TranType—' Executed' 

GROUP BY Tl.Userld, Tl.BasketId, Tl.ParentOrderld, Tl.Waveld, Tl.ChildOrderld; 

Query 6 Basket Analytics 

SELECT * FROM BASKET B, ANALYTICS A 

WHERE B.ProdType = A.ProdType AND B.Symbol = A.SymbolAND B.Exchange = A.Exchange; 
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6.3 Optimization Overheads 

The optimization overheads due to the proposed extensions were negligible. During plan gen- 
eration, the number of interesting orders we try at each join or aggregate node is of the order 
of the number of indices that are useful for answering the query. We found that in most real- 
life queries this number is fairly small. Figure [H] shows the scalability of the three heuristics. 
For the experiment a query that joined two relations on varying number of attributes was used. 
Though PYRO-P and PYRO-0 take the same amount of time in this experiment, in most cases, 
the number of favorable orders is much less than the total number of attributes involved and 
hence PYRO-0 generates significantly fewer interesting orders than PYRO-P. 

The plan-refinement algorithm presented in Section I4~2l was tested with trees up to 31 nodes 
(joins) and 10 attributes per node. The time taken was negligible in each case. The execution of 
plan refinement phase took less than 6 ms even for the tree with 31 nodes. 

Both the optimizer extensions and the extension to external-sorting (MRS) were fairly straight 
forward to implement. The optimizer extensions neatly integrated into our existing Volcano style 
optimizer. 



7 Conclusion and Future Work 

We addressed the issue of choosing interesting sort orders. We showed that even a simplified 
version of the problem is NP-hard and proposed principled heuristics for choosing interesting 
orders. Our techniques take into account important issues such as partially matching sort orders 
and operators that require matching sort orders from multiple inputs. We presented detailed 
experimental results to demonstrate the benefits due to our techniques. 

Secondary indices that do not cover the query can also be efficient to obtain a desired sort 
order if the actual tuple fetch can be deferred. Deferring the fetch until a point where the extra 
attributes are actually needed can be very effective when a highly selective filter discards many 
rows before the fetch is needed. Such optimizations must be made in a cost-based manner. 

Next, we would like to investigate additional ways of speeding up nested iteration involving 



21 



user-defined functions. Tliougli decorrelation is not always applicable for user-defined functions, 
we expect many cases that can be unnested using a combination of dataflow analysis and known 
unnesting methods. 

In many cases users do not retrieve the whole of the query result but need only the first few 
tuples. In Top-K queries this interest is explicitly stated and the query optimizers attempt to find 
a plan that is optimal for returning the first K results. However, there are cases where the value 
of K is not known a priori. The user may decide to terminate the query execution at an arbitrary 
point, perhaps based on the results seen so far. We would like to study plans that adapt with the 
number of rows retrieved. 
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A Optimality with Exact Favorable Orderes 



Theorem A.l The set I{e, o) computed with exact ford-mins contains an optimal sort order Op 

for the optimize goal e = (ei t>< e^) with (o) as the required output sort order 

Assumption: 

Ifoi, 02 are two orders such that attrs{o\) =attrs(o2), cpu-cost{e,0\) =cpu-cost{e,02) 

Proof: We define the plan cost of an order o' for the optimization goal (e = ei x Cr, o), where o 
is the required output order as follows: 

PC{e, o, 0')= cbp(ei, o') + cbp{er, o') + coe{e, o' , o) + CM, where CM is the cost of the merge 
join operator, which we assume to be same for all o' such that attrs{o') = S , where S is the set 
of attributes involved in the join predicate. 

Let Oh be an optimal order not belonging to set 1(e). We show that 3op e 1(e) such that 
PC(Op) =PC(ob) 

Case 1: Ob i ford(ei)[J ford(e,) 

Let Op be an order e 1(e) such that o A 5 < Op. Note that there must exist such an order in 1(e) 
since o AS £T (Step 1 in computing 1(e)) 

PC(Oh) = cbp(ei. Oh) + cbp(er, Ob) + coe(e, Ob, o) + CM 
Since, Ob i ford(ei) U ford(e,) we can write 
PC(oh) = cbp(ei, e) + coe(ei, e, Oh) + cbp(er, e)+ 

coe(er, e, Ob) + coe(e, Ob, o) + CM 
Since attrs(ob)=attrs(Op), coe(ex, e, Ob) = coe(ex, e, Op). Hence, 
PC(ob) = cbp(ei, e) + coe(ei, e, Op) + cbp(e,-, e)+ 
coe(er, 6, Op) + coe(e, Ob,o) + CM 

> cbp(ei. Op) + cbp(er, Op) + coe(e, Ob, o) + CM 
Further, Ob f\ o < Op l\ o since o A 5 < Op and Ob is a permutation 
of S . Hence, 

PC(Ob) > cbp(ei. Op) + cbp(e,., Op) + coe(e, Op, o) + CM 
PC(ob) >PC(Op). 

Case 2: Oh 6 ford(ei) \J ford(er) 

Case 2A: Ob e ford(ei) OR Oh e ford(e,) but not both. Without loss of generality assume 
Ob 6 ford(ei). This implies one of the following: 

1. 3o' eford-min(ei) such that Ob < o' 

o' eford-min(ei) => 3op e 1(e) such that o' A 5 < Op 
Since Ob is a permutation of S we have A 5 = Ob 
Therefore o' A S = Ob and hence Ob <= Op 

Further, since \ob\ = \Op\, we conclude Ob = Op. This contradicts our earlier assumption that 
Ob i 1(e) 
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2. 3o' eford-min(ei) such that o' < Ob and cbp(ei, o') + coe{ei, o', Ob) = cbp{ei, Ob) 
o' eford-min(ei) => 3op G 1(e) such that o' AS < Op 

PC(e, Oh) = cbp{ei, Ob) + cbp(er, Ob) + coe(e, Ob, o) + CM 
= cbp(ei, o') + coe(ei, o', Ob) + cbp(er, Op)+ 
coe{e, Ob,o) + CM 
(Note: cbp{er,Ob) = cbp{er,Op) as Ob i ford{er)) 
Since attrs{Ob) = attrs{Op) we can write 
PC{e,Ob) = cbp{ei,o') + coe{ei,o'Op) + cbp{er,Op)+ 
coe{e, Oh,o) + CM 
> cbp(ei. Op) + cbp(er, Op) + coe(e, Ob, o) + CM 

case (a) o' <o 

Then we could choose Op from I{e) such that 
o < Op. Hence, coe(e,Ob,o) > coe(e,Op,o) 
PC(e, Ob) > cbp(ei, Op) + cbp{er, Op) + coe{e, Op, o) + CM 
>PC(e,Op) 

case (b) o' ^ o 

Now, o' A o = Ob A o = Op A o (".■ o' < Ob and o' < Op) 
PC(e, Ob) > cbp(ei, Op) + cbp(er, Op) + coe(e, Op, o) + CM 
>PC(e,Op) 



Case 2B: Ob e ford(ei) as well as Ob e ford(er) 
This implies one of the following: 

1. 3o' eford-min(ei) (or ford-min{er)) such that Ob < o' 
In this case the proof can proceed as in Case 2A(1). 

2. 3oi,02 e ford-min{ei) such that o, < Ob and cbp{ei,Oi) + coe{ei,Oi,Ob) = cbp{ei,Ob) (for 
both i = 1 and i = 2) 

Since oi < Ob and 02 < Ob, either o\ < 02 or 02 < o\. Hence, 3op € I{e) such that 
0\ AS < Op and 02 AS < Op 

Choosing this Op and proceeding as in Case 2A (2) we can prove PC{e, Ob) > PC(e, Op) 
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